Application of Statistical Learning Theory to DNA Microarray Analysis
Abstract
This thesis focuses on applying Support Vector Machines (SVMs), an algorithm founded in the framework of statistical learning theory, to the analysis of DNA microarray data. The first part of the thesis focuses on extensions to SVMs required for analyzing microarray data. First, the problem of choosing multiple parameters at once for SVMs is addressed. This is used as the basis of a feature selection algorithm that allows us to select which genes are most relevant in discriminating between two classes. A methodology for outputting confidence levels as well as class labels is developed.

The second part of the thesis consists of a systematic evaluation of a variety of machine learning algorithms on five datasets from four types of molecular cancer classification problems. It also describes some very promising results in predicting treatment outcome from expression data for brain tumors and lymphoma. The algorithms compared are k-Nearest Neighbors (kNN), Naive Bayes (NB), Weighted Voting Average (WV), and Support Vector Machines (SVMs). Learning curves are constructed for the lymphoma treatment and morphology datasets to compare performance as a function of sample size. They also address two questions: given enough data, can clinically acceptable error rates be achieved, and how much data is needed to achieve such a rate? A simple analytic model is constructed to estimate the variance in classification accuracy due to sample size limitations.

Thesis Supervisor: Tomaso Poggio
Title: Uncas and Helen Whitaker Professor of Brain Sciences
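The abstract lists a weighted-voting (WV) classifier among the compared algorithms but does not define it. The sketch below assumes a common signal-to-noise formulation from the expression-profiling literature, not necessarily the variant used in the thesis. Each gene g receives a weight (mu_pos - mu_neg) / (sigma_pos + sigma_neg), computed from the two training classes. It then casts a vote weight * (x_g - midpoint). The margin between winning and losing votes gives a "prediction strength", which is one way to report a confidence level alongside a class label. The function names, the +1/-1 label convention, and keeping the genes with the largest |weight| are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

def train_weighted_voting(X, y, n_genes):
    """Hypothetical WV trainer. X: list of samples (lists of expression values);
    y: labels +1/-1. Returns (gene_index, weight, midpoint) for the top n_genes."""
    model = []
    for g in range(len(X[0])):
        pos = [x[g] for x, label in zip(X, y) if label == 1]
        neg = [x[g] for x, label in zip(X, y) if label == -1]
        mu_pos, mu_neg = mean(pos), mean(neg)
        denom = pstdev(pos) + pstdev(neg)
        # Signal-to-noise weight; a gene with zero spread in both classes gets 0
        weight = (mu_pos - mu_neg) / denom if denom > 0 else 0.0
        model.append((g, weight, (mu_pos + mu_neg) / 2))
    # Assumed simplification: keep the genes with the largest |weight|
    model.sort(key=lambda t: abs(t[1]), reverse=True)
    return model[:n_genes]

def predict_weighted_voting(model, x):
    """Returns (label, prediction_strength), with strength in [0, 1]."""
    votes = [w * (x[g] - mid) for g, w, mid in model]
    v_pos = sum(v for v in votes if v > 0)   # total vote for class +1
    v_neg = -sum(v for v in votes if v < 0)  # total vote for class -1
    total = v_pos + v_neg
    label = 1 if v_pos >= v_neg else -1
    strength = abs(v_pos - v_neg) / total if total > 0 else 0.0
    return label, strength

if __name__ == "__main__":
    # Toy data: gene 0 separates the classes, gene 1 barely does
    X = [[1.0, 5.0], [1.2, 5.1], [3.0, 5.0], [3.2, 4.9]]
    y = [1, 1, -1, -1]
    model = train_weighted_voting(X, y, n_genes=1)
    print(model[0][0])
    print(predict_weighted_voting(model, [1.1, 5.0]))
```

On the toy data, gene 0 is ranked first and a sample near the +1 class mean is labeled +1 with full strength. Ranking by |weight| alone is a simplification. Other formulations take an equal number of top genes favoring each class so that neither class dominates the vote.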
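The abstract mentions a simple analytic model for the variance in classification accuracy caused by limited sample size but does not state it. One simple model, assumed here and not necessarily the one in the thesis, treats each of n independent test samples as misclassified with probability p. The number of errors is then binomial, and the observed error rate has standard deviation sqrt(p(1 - p) / n). The helper names and the normal-approximation interval are hypothetical choices for illustration.

```python
import math

def error_rate_std(p, n):
    """Standard deviation of the observed error rate over n independent test
    samples, each misclassified with probability p (binomial model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)

def error_rate_interval(n_errors, n, z=1.96):
    """Normal-approximation interval (about 95% for z=1.96) for the true
    error rate, given n_errors mistakes on n test samples; clipped to [0, 1]."""
    p_hat = n_errors / n
    half_width = z * error_rate_std(p_hat, n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

if __name__ == "__main__":
    # A 20% error rate measured on 25 versus 100 test samples
    for n in (25, 100):
        print(n, error_rate_std(0.2, n))
    print(error_rate_interval(5, 25))
```

Under this model, quadrupling the test set halves the standard deviation. For p = 0.2, it falls from 0.08 at n = 25 to 0.04 at n = 100, which is why small clinical datasets yield wide error bars. The normal approximation is crude when n·p is small, and an exact binomial interval would be a more careful choice there.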
Similar resources
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data provide a wealth of information spanning thousands of variables (genes). Analyzing this information can reveal the genetic causes of disease and of differences between tumors. In this study we reduce the high-dimensional data with a statistical method to select high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach
The DNA microarray is an important technique that allows researchers to analyze the expression of many genes in parallel. Although the data can be more significant when they come from separate experiments, one of the most challenging phases in microarray analysis is integrating separate expression-level datasets that have been gathered through different techniques. In this paper, we prese...
Application of Clustering Methods to DNA Microarrays
Background: DNA microarray technology has enabled investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering, and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...
Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gene Expression
Nowadays, the growing volume of data and number of attributes in datasets has reduced the accuracy of learning algorithms and increased their computational complexity. Feature selection is one dimensionality reduction method, and it can be carried out through filtering or wrapping. Wrapper methods are more accurate than filter methods, but filter methods are faster and have a lower computational burden. With ...
Predicting CpG Islands and DNA Methylation in the Cow Genome Using DNA Microarray Meta-Analysis and Genome-Wide Scanning
DNA methylation is a type of epigenetic change that directly affects DNA. In mammals, DNA methylation is essential for fetal development and stem cell differentiation, and this phenomenon occurs essentially within CpG islands. In this study, two methods were used to study the DNA methylation profile of the cow genome. In the first method, the DNA methylation profile of the differentially expres...
Publication date: 2001